Development of sensor system and data analytic framework for non-invasive blood glucose prediction

Periodic quantification of blood glucose levels is performed using painful, invasive methods. The proposed work presents the development of a noninvasive glucose-monitoring device with two sensors, i.e., finger and wrist bands. The sensor system was designed with a near-infrared (NIR) wavelength of 940 nm emitter and a 900–1700 nm detector. This study included 101 diabetic and non-diabetic volunteers. The obtained dataset was subjected to pre-processing, exploratory data analysis (EDA), data visualization, and integration methods. Ambiguities such as the effects of skin color, ambient light, and finger pressure on the sensor were overcome in the proposed ‘niGLUC-2.0v’. niGLUC-2.0v was validated with performance metrics where accuracy of 99.02%, mean absolute error (MAE) of 0.15, mean square error (MSE) of 0.22 for finger, and accuracy of 99.96%, MAE of 0.06, MSE of 0.006 for wrist prototype with ridge regression (RR) were achieved. Bland–Altman analysis was performed, where 98% of the data points were within ± 1.96 standard deviation (SD), 100% were under zone A of the Clarke Error Grid (CEG), and statistical analysis showed p < 0.05 on evaluated accuracy. Thus, niGLUC-2.0v is suitable in the medical and personal care fields for continuous real-time blood glucose monitoring.

were not considered in this study, which may have led to bias. The model was trained on blood glucose levels between 103 and 175 mg/dL, whereas training with higher glucose levels is recommended for calibration and sensitivity. Various standards and evaluations were performed to validate the device.
A multisensory system was developed by employing NIR wavelengths of 1370 nm and 1640 nm and a radiofrequency sensor between 36.50 and 41.50 GHz. A random forest (RF) algorithm was employed to achieve a Root Mean Square Error (RMSE) of 21.06 mg/dL, a Mean Absolute Relative Difference (MARD) of 7.31%, and 96% of points within the clinically acceptable zones A and B of the CEG 28. The performance of the device is a limitation of this study; accuracy must improve and error must fall before real-world deployment of the sensor. The model can be trained on fasting, postprandial, and random blood glucose samples collected from diabetic and non-diabetic volunteers to observe its performance.
A multiple photonic-band near-infrared (mbNIR) sensor with a Shallow Dense Neural Network (SDNN) was proposed. A sensor with six 850 nm emitters and detectors was employed, which achieved an accuracy of 97.8% with a precision of 96.0%, sensitivity of 94.8%, and specificity of 98.7%. The detection range of the proposed device was 60–400 mg/dL with a prediction error of ± 15 29. Although the error is within the limit set by the International Standard ISO, it can still be reduced and the accuracy improved for precise prediction during the practical deployment of the sensor.
In a similar study on detecting hemoglobin (Hb), blood glucose, and creatinine (Cr), a PPG signal was acquired from a fingertip video. A source and detector at 850 nm for Hb, 950 nm for blood glucose, and 1150 nm for Cr were employed. A Deep Neural Network (DNN) was applied, which achieved an accuracy of 90.2% for blood glucose, 92.2% for hemoglobin, and 96.9% for creatinine 30. The detection process is a major limitation because it is non-portable, and deploying the application on different mobile phones can lead to errors in readings owing to different camera resolutions.
In an experimental trial, reflection spectroscopy between 1100 nm and 1825 nm was employed, where an SEP of 36.6 mg/dL and a MARD of 23% were achieved, which is a limitation of the study for practical deployment 31. Moreover, the noninvasive measurements are taken from the lip, which can lead to infection if the device is not sterilized properly, and the setup cannot be made into a portable device.
A parallel study employed two 940 nm LED sources, one 940 nm LED detector, and two 1300 nm detectors. Multiple polynomial regression (MPR) was applied with an Average Error and MARD of 6.09% for capillary and 4.88% and 4.86% for serum glucose, respectively 32. The size of the prototype is a limitation of this study and can be reduced, along with the error, for accurate prediction.
A wearable band-type system with a sensor size of 15 × 15 mm² was developed by employing four emitter LEDs with wavelengths of 950 nm, 850 nm, 660 nm, and 530 nm and a transmitter of 400–1100 nm. Pulsatile signals were recorded in the resting position to avoid a high SNR and baseline wander. An average correlation coefficient R p of 0.86 and an SPE of 6.16 mg/dL were obtained, which is a limitation of the study for practical deployment of the sensor. The reliability of the device was tested by comparing the heartbeat between PPG and Electrocardiography (ECG) signals and by investigating changes in blood glucose levels over a day 33.
The research gaps identified in the literature are addressed as follows.
• Non-portable 25–32 and wearable devices 24,33 did not address ambiguities such as skin color variation, ambient light, pressure of the finger on the sensor, and reliability, making them unsuitable for accurate continuous monitoring of blood glucose.
• The proposed methodologies have a high MAE/MARD/prediction error, which makes the devices unsuitable as replacements for invasive or minimally invasive methods.
• The devices have been tested on normal patients 25,28,30,33, non-diabetic patients with chronic health disorders 24,29, and a few diabetic patients 26,27,31,32.
• The prototypes developed in the existing literature are estimated to cost between $100 and $300, yet they are unsuitable for continuous monitoring of blood glucose, non-portable, and unreliable, with higher prediction errors.
An accurate, portable, and low-cost sensor system is needed to handle ambiguities such as skin color variation, ambient light, and pressure on the sensor when predicting blood glucose levels. The proposed work extends the existing methods and addresses the challenges of commercially available devices by developing a reliable prototype that integrates Artificial Intelligence (AI) and Data Science into a data analytic framework. The proposed work was designed to handle skin color variation, the presence of ambient light, and pressure on the sensor. Machine Learning (ML) models were applied to validate the developed framework, which achieved the highest accuracy compared with the existing literature. This high degree of accuracy supports its application in better diabetic management at a low cost. The ease of use of the prototype is an additional advantage.
The contributions of this proposed study are as follows.
• Development of a low-cost ($87.37) NIR spectroscopy-based noninvasive portable finger and wrist sensor prototype to detect blood glucose levels continuously.
• A novel data analytic framework was designed to improve the accuracy from 71% for niGLUC 1.0v (first version) to 99.96% for the proposed device, niGLUC 2.0v.
• The accuracy of the developed sensor system was tested for its reliability and stability in the presence of skin color variations, ambient light, and finger pressure.
The rest of the paper is organized as follows. Sect. 2 presents the methods for selecting measurement sites, the principle of blood glucose measurement, the hardware architecture of niGLUC-2.0v, the cost comparison of niGLUC-2.0v with commercial devices, the development of a data analytic framework for niGLUC-2.0v, overcoming the challenge of skin pigmentation, predictive analysis, experimental design, data collection, testing the accuracy of niGLUC-2.0v for variation in ambient light, and handling other ambiguities. The results and discussion are presented in Sect. 3, where predictive analysis, validation of niGLUC-2.0v, the Bland–Altman plot, CEG analysis, statistical analysis, and a comparison with recent research works are covered. The paper ends with Sect. 4, which presents the conclusion.

Methods
The current section elaborates on the selection of measurement sites, the principle behind light absorption, and the hardware architecture. An adjustment factor was proposed to handle skin color variation. The adjustment factor was evaluated for different intervals and multiplied by the voltages generated from niGLUC-2.0v. The dataset was fed into the data analytic framework, where EDA was applied and the highest accuracy was achieved. The proposed data analytic framework improves the accuracy of the developed device in the presence of ambiguities. A cost comparison of niGLUC-2.0v with commercially available devices was performed. ML algorithms and evaluation metrics are presented to validate the developed device. The experimental procedure and data collection are elaborated with the hardware setup.

Selection of measurement sites
Noninvasive blood glucose measurements from the lips, cheeks, tongue, eyes, earlobe, fingertip, and wrist have been reported in the literature 34–40. The fingertip and wrist have thin skin folds and are rich in blood vessels, i.e., capillaries, which lie well above the fat layer of the skin and in which blood glucose can be easily detected. NIR requires thin skin folds and has the property of transilluminance, i.e., when NIR light passes through skin and tissues, it penetrates the underlying structures such as blood vessels, where absorption takes place. Information-rich spectral intervals are found in the first overtone and combination-band vibrations 31,41,42. The fingertip and wrist were chosen in the proposed work based on their biophysical properties, the better absorption properties of NIR at these sites, the lower risk of infection and measurement alterations, and the ease of handling the device.

Principle of blood glucose measurement-Absorption physics at Near-Infrared region
The absorption of light by blood glucose molecules (C6H12O6) is due to the overtone and combination bands, in which photons are absorbed and induce molecular vibrations. These vibrations are due to covalent bonds, which behave like springs through bending and stretching. Stretching of the CH and OH bonds was observed in this region. Molecules vibrate and absorb when the frequency of light matches the vibrating frequency 43–45. This absorption is described by the Beer–Lambert law, as illustrated in Fig. 1.
According to the Beer–Lambert law, the absorbance of any solution is proportional to its concentration and the path length traveled by light rays 46. When the blood glucose concentration is high, the absorbance of photons by blood glucose molecules is high, with decreased scattering and a shorter optical path 47. The principle of blood glucose measurement (Eq. 1) is written in terms of the following quantities: R is the reflected light intensity, R 0 is the incident light intensity, l is the length of the optical path inside the tissue, and µ eff is the effective attenuation coefficient with respect to the absorption and reduced scattering coefficients. The effective attenuation coefficient in turn depends on ε, the molar extinction coefficient; C, the tissue chromophore concentration; μ s ′, the reduced scattering coefficient; and a, the average of the cosine of the scattering angles.
From Eq. 1, it can be inferred that the glucose molecules absorb the light produced by the NIR emitter, and the reflected light is measured at the detector as a voltage. The absorption, scattering, and transmission of light through the sample depend on the glucose concentration.
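A form of these relations that is consistent with the symbol definitions above can be written as follows. This is a sketch that assumes the standard Beer–Lambert exponential attenuation and the usual diffusion-approximation expression for the effective attenuation coefficient; the symbols μ a (absorption coefficient) and μ s (scattering coefficient) are introduced here for illustration and are not defined in the text.

```latex
% Assumed standard forms, not taken verbatim from the source
R = R_0 \, e^{-\mu_{\mathrm{eff}}\, l}

\mu_{\mathrm{eff}} = \sqrt{3\,\mu_a\,\left(\mu_a + \mu_s'\right)}, \qquad
\mu_a = 2.303\,\varepsilon\, C, \qquad
\mu_s' = \mu_s\,(1 - a)
```

Under these assumptions, a higher chromophore concentration C raises μ a and therefore μ eff, so the reflected intensity R falls, which matches the inverse relation between glucose concentration and reflected signal described in the text.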

Rationale behind selection of sensor
The sensor was selected on the basis of the wavelength and its penetration depth into the skin. Capillary loops carrying blood glucose molecules are present in the dermal layer of the skin at a depth of 2.0 mm, which is easily penetrated by NIR sensors 48. From the literature, glucose absorption peaks are found at 660 nm, 940 nm, 1550 nm, and 1650 nm, where the penetration depth of light is highest at 940 nm 28,32,49,50. Below 700 nm and above 950 nm, the penetration of light is challenging owing to the strong absorption by hemoglobin and water molecules. The penetration depth increases up to 900–1000 nm and then decreases 48–51. At 940 nm, attenuation by other constituents of the blood, such as water, hemoglobin, and melanin, is at a minimum 52–54. Therefore, 940 nm was selected for this study.

Hardware architecture of niGLUC-2.0v
The hardware design was conceptualized using SW-NIR between 700 and 1300 nm. An NIR sensor with a 940 nm emitter and a 900–1700 nm detector was chosen to detect the blood glucose molecule. The sensitivity of the sensor is 0.9–0.95 amperes/watt. The specificity of the sensor is 0.5 amperes/watt. The sensing range for detecting blood glucose levels is between 0 and 0.3 mm. Two sites were selected for the detection of blood glucose levels: the finger and the wrist. A block diagram of the prototype is shown in Fig. 2. Aluminum gallium arsenide (GaAlAs) LEDs were chosen because the p-surface was coated with silicon nitrate (SiN4O12), which helped to reduce the interference of ambient light and provided better stability at the output. The half angle of the LED was 40°. The emitter and detector were placed on the same side so that the reflected light was captured at the detector, and thus, a 180° phase shift occurs between them. The distance between the emitter and the detector was 5.5 mm. The niGLUC-2.0v operates with a 5 V power supply and 2 A current. The operating power is 0.5 W for the finger sensor and 0.7 W for the wrist sensor. The current consumed by the finger and wrist sensors was 100 mA. When light from the NIR emitter passes through the blood, the detector detects the light reflected from the blood in the form of a signal 55. The amplitude of the signal depends on the blood glucose concentration. If the blood glucose concentration is high, the reflected signal is low, and vice versa.
A block diagram of the circuit protection and its components is presented in Fig. 3. Various circuit protection mechanisms have been implemented 56,57. In the proposed work, because the circuit works on a 5 V power supply (a small-voltage application), a voltage regulator is implemented to provide overvoltage protection, voltage spike suppression, and thermal protection. The internal circuitry and its components were protected by resistors and the ground. The sensor consists of an emitter and a detector circuit. The reflected signal at the detector was passed through a low-pass filter and then amplified using a power amplifier. The output is an amplified analog signal fed to the data acquisition (DAQ) unit. Radiation safety is considered based on the International Commission on Non-Ionizing Radiation Protection (ICNIRP) guidelines on incoherent visible and infrared radiation, which indicate thermal injury of the cornea in the case of direct eye exposure > 1000 s 58. In the proposed study, there was no direct eye contact with the sensor, and the exposure to radiation was < 60 s. The components were secured on a Printed Circuit Board (PCB) covered with a black body and a transparent window underneath that allows light to pass from the emitter into the skin. The DAQ converts the analog signal into a digital signal, and the voltage values can be viewed using LabVIEW software. The output voltages were received serially in frames, and 100 frames were collected as one corresponding sample. Data analysis is carried out by pre-processing the data and then applying the data analytic framework, i.e., EDA, data visualization, data integration, and predictive analysis. Predictive analysis was performed by applying ML models to predict the blood glucose levels.

Cost comparison of niGLUC-2.0v with commercial devices
The cost of the proposed niGLUC 2.0v device is listed in Table 1. Costs were compared between invasive, minimally invasive, and implantable devices and niGLUC-2.0v. It can be inferred from the table that the total cost of pathology lab reports, including Fasting Plasma Glucose (FPG), postprandial (PP), and Glycated Hemoglobin (HbA1C) profiles performed four times a year at Apollo Hospital, is estimated to be $35 59. An Accu-Chek glucometer costs $430, including additional supplies (glucometer, lancet, and strip of 50 counts) purchased four times a year 60. The total cost of a minimally invasive device, for one Dexcom G6 transmitter and three sensors, was $1060. The sensor and transmitter must be changed every 10 and 90 days, respectively 61. Eversense E3 is an implantable device available with insurance, which costs $675.3 for one-time insertion and sensor removal 62. In the proposed study, LabVIEW was used for the convenience of data collection. Arduino-based open software will be used in the final product. niGLUC-2.0v costs $85.6 as a one-time purchase and is a non-invasive, portable device that does not require replacing sensors or transmitters.

Development of data analytic framework for niGLUC-2.0v
The flowchart of the proposed work is shown in Fig. 4. The finger and wrist sensors were switched on, and the values were recorded from the sensors. One hundred samples from the sensor were recorded and saved in an Excel file along with the physiological details of the patient. Normalization and the data analytic framework were applied to the datasets, on which ML algorithms were used to accurately predict blood glucose levels.

Overcoming the challenge of skin pigmentation
Three healthy volunteers aged 33–35 years with dark, wheatish, and fair skin tones were selected, as shown in Fig. 5a–c. The volunteers were asked to fast the previous night, and all consumed the same quantity of breakfast and lunch. Fasting and postprandial blood glucose values were recorded invasively using a home monitoring kit and non-invasively using niGLUC-2.0v, as illustrated in Table 2. A total of 44 readings were collected over five days. For comparison, blood glucose levels were divided into intervals of 5 mg/dL, i.e., 91–95 mg/dL, 121–125 mg/dL, 126–130 mg/dL, and 131–135 mg/dL. It can be observed from Table 2 that the postprandial values (after breakfast) obtained invasively for volunteers 1 and 3 were the same, i.e., 128 mg/dL. In contrast, the corresponding wrist voltages differed, i.e., 0.15548 V for volunteer 1 and 0.164059 V for volunteer 3, a difference of 0.008579 V. A larger difference can be observed for volunteer 2, with 0.055806 V at 126 mg/dL, when compared with volunteers 1 and 3, whose blood glucose level of 128 mg/dL falls within the same interval: relative to volunteer 1, the difference was 0.099674 V. Although the differences among the three healthy volunteers were negligible, skin color becomes a significant source of interference when a large dataset with different age groups, sexes, BMI, and other physiological factors is considered. There is a need for NIR sensors that account for skin color interference for accurate blood glucose predictions.
The current work proposes a novel interval-based adjustment factor, as detailed in Algorithm 1 in Table 3, for handling skin color pigmentation. The dataset was arranged in ascending order of the reference blood glucose values. The dataset was divided into dark, wheatish, and fair skin colors. The invasive blood glucose values from all skin colors and their respective niGLUC-2.0v values were grouped at intervals of 5 mg/dL. The variance of the three skin tones falling within the same interval was calculated using Eq. (2). The variance was evaluated to be 0.000375 for G 3 (T). The correction factor with respect to the variance was evaluated at each interval, as given in Eq. (3). A correction factor of 0.019366 was obtained for G 3 (T). The adjustment factor was calculated from the invasive blood glucose level and the correction factor, as given in Eq. (4). It was calculated for all voltages falling within the interval. Similarly, the adjustment factor was calculated for all intervals. An adjustment model was created and fed into the data analytic framework to predict blood glucose levels.
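The grouping and variance steps of Algorithm 1 can be sketched in Python as below. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the population variance is an assumption, and taking the correction factor as the square root of the variance is inferred only from the reported pair (0.000375 and 0.019366), since Eq. (3) is not reproduced in the text. The adjustment factor of Eq. (4) is deliberately left out because its form is not given.

```python
import math
from collections import defaultdict
from statistics import pvariance


def interval_label(glucose_mg_dl, width=5):
    """Map a reference value to its interval, e.g. 128 -> (126, 130)."""
    low = ((int(glucose_mg_dl) - 1) // width) * width + 1
    return (low, low + width - 1)


def correction_factors(readings, width=5):
    """readings: iterable of (reference_mg_dl, sensor_voltage) pairs
    pooled over all skin tones. Returns {interval: (variance, factor)}."""
    groups = defaultdict(list)
    for glucose, voltage in readings:
        groups[interval_label(glucose, width)].append(voltage)

    result = {}
    for interval, volts in sorted(groups.items()):
        if len(volts) < 2:
            continue  # variance needs at least two voltages
        variance = pvariance(volts)  # assumption: population variance
        # Assumption: correction factor = sqrt(variance), inferred from
        # the reported values, not stated in the source.
        result[interval] = (variance, math.sqrt(variance))
    return result


# Wrist voltages quoted in the text for the three volunteers
readings = [(128, 0.15548), (128, 0.164059), (126, 0.055806)]
factors = correction_factors(readings)
```

All three readings quoted in the text fall into the 126–130 mg/dL interval, so `factors` contains a single entry for that interval.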

Exploratory data analysis for niGLUC-2.0v
EDA is implemented to understand the patterns in the data through visualization and to rectify errors, anomalies, and outliers that may arise during data collection. It is used to obtain the desired level of prediction by estimating the parameters and margins of error from existing data 63. A data integration method was employed to preprocess the dataset. The output of Algorithm 1 is the adjustment factor at different intervals multiplied by the blood glucose levels. The dataset, with the adjustment factor multiplied by the voltages from niGLUC-2.0v, was fed into the data analytic framework for accurate blood glucose prediction.
The step-by-step flow of the data integration method is presented in Algorithm 2 in Table 4. The dataset was arranged in ascending order of the reference blood glucose levels obtained invasively. The reference blood glucose levels were divided into intervals of 5, i.e., from 81-85 mg/dL to 486-500 mg/dL. For ease of calculation, the reference blood glucose values in mg/dL were converted into millimoles (mmol). EDA was then applied to the dataset. At this step, the reference blood glucose values within an interval of five were averaged. The dataset was updated by assigning the averaged reference value to the respective hardware values in the interval. For example, for a blood glucose interval of (4.22–4.44) mmol, the average value is 4.33 mmol, as illustrated in Table 4. Every hardware-generated value was assigned the averaged reference value of its interval. The process was repeated for every interval of five in the invasively obtained blood glucose levels. A new dataset was created with a column of average blood glucose values, on which predictive analysis was applied, as discussed in the next section.
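The data integration steps of Algorithm 2 can be illustrated with a short Python sketch. It assumes the usual conversion of 18 mg/dL per mmol/L for glucose and averages the reference values that actually fall in each interval; the function and variable names are hypothetical and the sample voltages are made up for illustration.

```python
from collections import defaultdict

MG_DL_PER_MMOL_L = 18.0  # standard glucose conversion factor


def to_mmol(mg_dl):
    """Convert a glucose value from mg/dL to mmol/L."""
    return mg_dl / MG_DL_PER_MMOL_L


def integrate(samples, width=5):
    """samples: list of (reference_mg_dl, sensor_voltage).
    Returns (sensor_voltage, averaged_reference_mmol) pairs, where every
    voltage in an interval gets that interval's mean reference value."""
    ordered = sorted(samples, key=lambda s: s[0])  # ascending reference

    def bin_index(ref):
        # 76-80 -> one bin, 81-85 -> the next, and so on
        return (int(ref) - 1) // width

    bins = defaultdict(list)
    for ref, _ in ordered:
        bins[bin_index(ref)].append(ref)

    averaged = {
        idx: round(to_mmol(sum(refs) / len(refs)), 2)
        for idx, refs in bins.items()
    }
    return [(volt, averaged[bin_index(ref)]) for ref, volt in ordered]


# Illustrative (made-up) voltages for the 76-80 mg/dL interval
dataset = integrate([(80, 0.12), (76, 0.10), (78, 0.11)])
```

With these inputs the interval spans 4.22 to 4.44 mmol, and each voltage is paired with the averaged value 4.33, matching the example given in the text.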

Predictive analysis
This section presents an exploration of different ML algorithms and evaluation metrics to analyze the performance of the sensor system.

Choosing the algorithms
The proposed work consists of a dataset with continuous variables for prediction implying a regression problem.Predictive models are selected by comparing and analyzing various regression algorithms from the literature.
The literature on the rationale behind the selection of regression algorithms reports that the performance of predictive models depends on the methodologies implemented, the dataset created 50,64–69, and the size and heterogeneity of the dataset 70. Regression algorithms differ in their principles of operation, advantages, and limitations. In linear regression (LR), the algorithm determines the best-fitting line by minimizing the difference between the predicted and observed values of the dependent variable. Although its simplicity is advantageous, it cannot handle outliers or capture nonlinear patterns, which is a limitation 71,72. Polynomial regression (PR), an extension of LR, models the relationship as an nth-degree polynomial function to handle complex patterns; however, its major limitation is its susceptibility to overfitting at higher polynomial degrees 71,72. Lasso CV, a regularized variant of LR, was chosen because it minimizes the residual sum of squares (RSS) and adds regularization. A limitation is that the regularization can discard important features when their coefficients shrink to zero. The advantage of Lasso CV over LR and PR is its ability to select features automatically, leading to a sparse model, and to handle multicollinearity in the dataset 71–73. Random forest (RF) aggregates the predictions from multiple decision trees and has the advantage of handling high-dimensional data and reducing overfitting. Its limitations are that hyperparameter tuning is required and that it is less interpretable than individual decision trees 71,73. Ridge regression (RR) penalizes the squared values of the regression coefficients without shrinking the parameters exactly to zero and handles multicollinearity, which is an advantage. The limitation of this method is its inability to perform automatic feature selection and handle sparse data 72,73. In k-nearest neighbors (k-NN), the dependent variable is predicted by averaging the values of the k nearest neighbors. Although it has the advantage of simplicity and the ability to handle complex patterns, it requires scaling of features and increases the computational overload, which is a limitation 71. A decision tree (DT) partitions the data into branches based on feature values, each corresponding to a decision rule. It is simple and performs feature selection automatically, which is advantageous 71,73. However, overfitting is a disadvantage. Ensemble learning models, that is, bagging and boosting on k-NN and DT, have been explored in the literature. In bagging, models learn independently of each other in parallel, and their predictions are aggregated to determine the model average. In boosting, models learn sequentially and adaptively to improve the prediction of the learning algorithm. The main advantage of applying the bagging algorithm to k-NN and DT is its ability to reduce variance and overfitting, but computational complexity poses a limitation. Applying a boosting algorithm to k-NN and DT limits the bias and variance but is prone to overfitting because of weak learners, which is a limitation 74. In contrast, the neural network (NN) model learns complex patterns and relationships through interconnected neurons, with the advantage of high scalability and flexibility. The disadvantages of the NN model are the large amount of data required for training and the computational overload 71,72.
In the proposed work, LR is applied to determine the nature of the dataset, and based on the degree of nonlinearity, PR is applied to determine the relationship between the variables. Lasso CV and RF were chosen to avoid overfitting through generalization and regularization and to improve the model performance. As the dataset was multicollinear, RR was selected. Because the dataset was nonlinear and complex, k-NN and DT were selected. Bagging and boosting algorithms were selected to analyze the performance of the models. Therefore, in this proposed work, LR, PR, Lasso CV, RF, RR, k-NN, k-NN bagging, k-NN boosting, DT, DT-bagging, DT-boosting, and NN were applied to identify the best algorithm for real-time collected datasets.
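To make the ridge penalty concrete, the following Python sketch fits a single-feature ridge regression in closed form and compares it with the unpenalized (LR) fit. It is an illustration of the principle described above, not the model used in the study: the data are made up, the intercept is left unpenalized by assumption, and `fit_ridge_1d` and `alpha` are hypothetical names.

```python
def fit_ridge_1d(x, y, alpha):
    """Minimize sum((y - b - w*x)^2) + alpha * w^2 for one feature.
    The intercept b is not penalized (an assumption of this sketch)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / (sxx + alpha)  # alpha = 0 gives ordinary least squares
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Made-up data: reflected voltage falls as glucose (mmol) rises
voltage = [0.20, 0.18, 0.16, 0.14, 0.12]
glucose = [4.5, 5.5, 6.5, 7.5, 8.5]

ols_slope, ols_intercept = fit_ridge_1d(voltage, glucose, alpha=0.0)
ridge_slope, ridge_intercept = fit_ridge_1d(voltage, glucose, alpha=0.001)
```

The ridge slope is smaller in magnitude than the unpenalized slope but is not driven to zero, which is the behavior attributed to RR in the text; in practice the penalty strength would be chosen by cross-validation.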

Evaluation metrics for validating niGLUC-2.0v
Evaluation metrics are required to measure model performance and build a generalized model. In the proposed study, the MAE, MSE, and R² score (r2_score) were evaluated for the optimized prediction of the glucose concentration, as detailed in Eqs. (8)–(10):
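The three metrics follow their standard definitions, which can be sketched in plain Python as below. The function names and the sample values are illustrative only and are not taken from the study's data.

```python
def mean_absolute_error(y_true, y_pred):
    """MAE: mean of |reference - prediction|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """MSE: mean of (reference - prediction)^2."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot (coefficient of determination)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up reference and predicted glucose values (mg/dL)
reference = [100, 120, 140]
predicted = [102, 118, 141]

mae = mean_absolute_error(reference, predicted)
mse = mean_squared_error(reference, predicted)
r2 = r2_score(reference, predicted)
```

Because the MSE is never smaller than the square of the MAE, the two values reported for a model can be cross-checked against each other.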

Experimental design and data collection
This section presents the experimental design and data collection procedure.

Healthcare data standards and data characteristics
Standard healthcare precautions were taken to reduce the risk of bloodborne or pathogen transmission from recognized and unrecognized sources. Hand hygiene and respiratory and cough etiquette were strictly followed.
The selection criteria of the volunteers are presented in Table 5.
The existing literature 24,29,32 considered > 100 patients. However, the sensor was not tested above 200 mg/dL in 24 or in a study with a similar number of volunteers 32, whereas in 29 the inclusion criterion was any volunteer older than 18 years with chronic kidney disease, which restricts the variance available for analysis and leads to bias. Studies that considered < 100 patients included only healthy participants 28,31 or diabetic and nondiabetic participants 26,27, and a few did not provide any information regarding the volunteers 25,30,33. Volunteer demographics play a significant role in the validation and sensitivity of the sensors. They help augment the quality of care by detecting variances in treatment and ensuring that competent care is provided 75. A total of 101 patients were considered in the proposed study, with an age range of 25 to 78 years, with 57 male and 44 female volunteers. Volunteers with proper mental health and cognition were recruited as the inclusion criteria, whereas volunteers who had hypoglycemic episodes with unconsciousness or seizure disorders were excluded from the study owing to the possibility of measurement errors. The sensor was validated in all volunteers, with blood glucose values ranging from 80 to 488 mg/dL, thus validating the robustness of the sensor.
The steps taken to maintain the standard health protocol and the characteristics of the collected data are as follows:
• Ethical clearance was obtained from SRM Medical College Hospital & Research Centre (ethical clearance number 8274/IEC/2022).
• Relevant guidelines and regulations were implemented for all methods.
• All experimental protocols were approved by the SRM Medical College Hospital and Research Centre.
• A doctor from the SRM Medical College Hospital and Research Centre was involved in the current study.
• Informed consent was obtained from each volunteer.
• Particulars of the volunteers were recorded, such as name, age, sex, details of the meal taken, the time between the reading and the meal, whether the individual had diabetes, whether the volunteer was on any medications, physical activity in daily life, height, weight, sleep status, stress, SpO2, hair on the wrist, and any other health complications.
• The finger of the volunteer was cleaned with isopropyl alcohol before the invasive sample was taken.
• A new set of lancets and strips was used for each sample.
• The finger of the volunteer was pricked with the lancet. The drop of blood was placed on the strip, which was then inserted into the glucometer of the invasive device. The blood glucose value shown on the device was noted. The skin surface was cleaned again with an isopropyl alcohol solution and cotton.
• The noninvasive method of obtaining blood glucose values from the hardware was explained to the volunteer. The volunteer was asked to insert a finger into the wearable prototype for the measurement of blood glucose values. The wrist wearable was tied to the wrist of the volunteer to obtain blood glucose values from the wrist.
The finger and wrist sensors of niGLUC-2.0v are shown in Fig. 6a. The finger of the volunteer was placed on the hardware, as shown in Fig. 6b. Blood glucose levels were measured invasively using the finger prick method as a reference value. Fasting, postprandial, and random blood tests were performed using a finger prick to determine real-time blood glucose values. Similarly, the wrist sensor was worn by a volunteer, as shown in Fig. 6c. The collected data were visualized using LabVIEW software. A total of 100 data points were recorded in an Excel sheet for a single sample. To validate the proposed hardware, blood glucose measurements from the designed noninvasive hardware were compared with the blood glucose values of the invasive method, i.e., the reference values. The finger and wrist prototype of niGLUC-2.0v was tested in 101 diabetic and prediabetic individuals. Fasting, postprandial, and random blood glucose levels were collected from males and females aged 20–90 years. The baseline data collection with samples and sex distribution of the volunteers are listed in Table 6.

Normalization
Normalization was applied to remove the effect of the dark current. This is detailed in Eq. (8).
• The dark current value DC of the photodiode was noted by switching off niGLUC-2.0v.
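A minimal Python sketch of dark-current removal is given below. It assumes that normalization subtracts the measured dark-current value DC from each recorded voltage; the exact form of the normalization equation is not reproduced in the text, so this subtraction, the function name, and the sample values are all assumptions for illustration.

```python
def remove_dark_current(voltages, dark_current):
    """Subtract the dark-current reading (taken with the emitter
    switched off) from every recorded voltage.
    Assumption: normalization is a plain offset subtraction."""
    return [v - dark_current for v in voltages]


# Made-up frame voltages and dark-current value, in volts
frames = [0.155, 0.164, 0.160]
normalized = remove_dark_current(frames, dark_current=0.010)
```

The corrected voltages would then replace the raw voltages as inputs to the data analytic framework.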

Testing the accuracy of niGLUC-2.0v for variation in ambient light
Light sources may interfere with sensor accuracy 76. niGLUC-2.0v was tested in volunteers with and without diabetes. Random glucose levels in both volunteers were measured invasively and non-invasively at regular intervals. As shown in Table 7, in the presence and absence of ambient light, the noninvasive blood glucose values were close to the reference blood glucose levels obtained invasively.

Handling other ambiguities
The measurements with the finger and wrist sensor prototypes were performed with the volunteer seated in a stable state to minimize the effect of motion artifacts. The measurements were performed with and without pressure on the finger and wrist sensor prototypes. Two datasets were created in this study. The data analytic framework was applied to the dataset, where no difference was observed between the reference and predicted glucose levels, as illustrated in Table 8. It was observed that the pressure did not influence the device because the surface of the skin had no direct contact with the sensor, which was covered with a transparent surface and firmly packed. The reliability of the device was tested at regular intervals in different patients.

Results and discussion
This section presents the results of the predictive analysis, Bland-Altman analysis, Clarke error grid analysis, and statistical analysis, together with a comparison of the current work with the existing literature. The regression models discussed in the Methods section were applied to niGLUC-2.0v. Ten input features, i.e., non-invasive blood glucose value, age, sex, body mass index (BMI), details of the meal taken, whether the individual is diabetic, sleep status, stress, SpO2, and hair on the wrist, were considered as inputs of the model.

Predictive analysis on niGLUC-2.0v
In this section, predictive analysis is applied to niGLUC-2.0v. LR, PR, RF, Lasso CV, RR, k-NN, k-NN bagging, k-NN boosting, DT, DT-bagging, DT-boosting, and NN were applied to the computational model to identify the optimal regression method for precise prediction of blood glucose. The ML algorithms were applied to the dataset obtained from the finger and wrist sensors, the normalization-applied dataset, and the data analytic framework-applied dataset. The calibration and comparative results of all ML models are presented in Table 9. The shaded row in the table represents the best performance of the respective ML model among all ML algorithms. The datasets of the finger sensor performed well with RR. The data analytic framework-applied dataset gave the best accuracy, achieving an MAE of 0.15, MSE of 0.2287, and r2_score of 0.9902. Similarly, the wrist sensor prototype performed well with RR, whereas on the normalization-applied dataset, the wrist sensor performed best with the Lasso CV regression model. The wrist sensor performed best on the data analytic framework-applied dataset, with an MAE of 0.06, MSE of 0.006, and r2_score of 0.9996. The comparison of the ML algorithms and the niGLUC-2.0v sensors establishes the reliability of the prototype based on the following findings: (i) The data analytic framework-applied dataset performed best when compared with the dataset without the data analytic framework for the LR, PR, Lasso-CV, k-NN, k-NN bagging, DT, DT-bagging, and DT-boosting ML models. (ii) It can be inferred from the comparison of sensors and ML models that the finger and wrist sensors of the niGLUC-2.0v prototype performed best with the data analytic framework. (iii) The proposed data analytic framework outperformed the attempt to remove the dark current through normalization, providing better accuracy than the normalization-applied dataset. Therefore, the proposed data analytic framework performed best with RR in the presence of skin color variation, finger pressure, and ambient light.

Ridge regression for blood glucose prediction in niGLUC-2.0v
RR performed best, with the highest accuracy for the developed model, as reported in Table 9. RR is a model-tuning method used to analyze multiple regression data that suffer from multicollinearity. Multicollinearity occurs when a high correlation exists between independent variables, thereby raising the issue of high variance. A large variance causes the predicted values to deviate from the reference values, thus increasing the loss. RR, which is a regularization technique, was applied by adding a penalty term to the loss function. The penalty is equal to the square of the magnitude of the coefficients. RR minimizes the error by adding a degree of bias to the regression estimates. The challenge posed by multicollinearity is reduced by adding a shrinkage parameter.
The RR derived in Eq. (11) has two components. The former component represents the least-square term, whereas the latter represents the penalty term added to the least-square term to attain a low variance.
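The two components described above can be illustrated with a minimal sketch. It assumes the standard ridge objective, i.e., the sum of squared residuals plus a shrinkage parameter times the sum of squared coefficients, and solves it in closed form. The function name ridge_fit, the parameter name lam, and the toy data are hypothetical and are not taken from the study; the intercept is omitted for brevity.

```python
def ridge_fit(X, y, lam):
    """Return w minimizing sum((y - Xw)^2) + lam * sum(w^2).

    Assumed standard ridge form; solved via (X^T X + lam*I) w = X^T y.
    X is a list of rows, y a list of targets, lam >= 0 the shrinkage parameter.
    """
    p = len(X[0])
    # Normal-equation matrix A = X^T X + lam * I and right-hand side b = X^T y.
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    w = [0.0] * p
    for i in reversed(range(p)):
        tail = sum(A[i][j] * w[j] for j in range(i + 1, p))
        w[i] = (b[i] - tail) / A[i][i]
    return w


def squared_norm(w):
    """Sum of squared coefficients, i.e., the quantity the penalty acts on."""
    return sum(v * v for v in w)


# Toy data (illustrative only): two nearly collinear input features.
X_toy = [[1.0, 1.01], [2.0, 1.98], [3.0, 3.02], [4.0, 3.99]]
y_toy = [2.0, 4.1, 5.9, 8.0]

w_unpenalized = ridge_fit(X_toy, y_toy, lam=0.0)  # ordinary least squares
w_ridge = ridge_fit(X_toy, y_toy, lam=1.0)        # coefficients shrunk by the penalty
```

With nearly collinear features, the unpenalized coefficients are poorly determined, whereas a positive shrinkage parameter pulls them toward smaller magnitudes, trading a small bias for lower variance.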

Validation of niGLUC-2.0v
The validation of the proposed model, the performance metrics, and visualizations through graphs are presented in this section. A plot of the reference and predicted blood glucose levels on the proposed data analytic framework is depicted in Fig. 7. The performance of niGLUC-2.0v is shown in Fig. 7a for the finger and Fig. 7b for the wrist sensor prototype. It can be inferred that the data points lie close to the trend line, which defines the correlation between the reference and predicted blood glucose levels. The X-axis represents the reference blood glucose levels, and the Y-axis represents the predicted blood glucose levels in mmol. The red line in the graph represents the prediction by RR. It can be inferred that the predicted and reference mmol values were close, indicating good prediction accuracy.
The niGLUC-2.0v hardware was validated by performing error analysis, in which the MAE was evaluated. The model was tested using a new dataset, as presented in Tables 10 and 11. It can be inferred from Table 10 that the maximum error obtained was 1.92 mg/dL, and the minimum error obtained was −2.47 mg/dL for the finger sensor. The MAE obtained from 20 finger sensor data measurements was 0.15. Similarly, from Table 11, the maximum error obtained was 2.63 mg/dL, and the minimum error obtained was −0.02 mg/dL for the wrist sensor. The MAE obtained from the 20 wrist sensor data measurements was 0.068. It can be inferred that the training and testing MAE are the same for the finger and wrist sensor prototypes. The data analytic framework in niGLUC-2.0v improved the performance of the model on the finger sensor, with an accuracy of 99.02%, MAE of 0.15, and MSE of 0.22, whereas on the wrist sensor, the accuracy obtained was 99.96%, with MAE of 0.06 and MSE of 0.006. The accuracy of both devices was within the clinically acceptable range. Therefore, the developed device is suitable for medical applications.
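The performance metrics reported above can be computed as in the following minimal sketch. It assumes the usual definitions of MAE, MSE, and the coefficient of determination (r2_score), and assumes the error is taken as predicted minus reference; the function name and the sample values are illustrative and are not data from the study.

```python
def regression_metrics(reference, predicted):
    """Return (MAE, MSE, R^2) for paired reference and predicted values."""
    n = len(reference)
    # Signed error per sample (assumed convention: predicted minus reference).
    errors = [p - r for r, p in zip(reference, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    # R^2 = 1 - residual sum of squares / total sum of squares.
    mean_ref = sum(reference) / n
    ss_total = sum((r - mean_ref) ** 2 for r in reference)
    ss_residual = sum(e * e for e in errors)
    r2 = 1.0 - ss_residual / ss_total
    return mae, mse, r2


# Illustrative values only (mg/dL), not measurements from the study.
reference_example = [100.0, 120.0, 140.0]
predicted_example = [101.0, 119.0, 142.0]
mae, mse, r2 = regression_metrics(reference_example, predicted_example)
```

The maximum and minimum errors quoted from Tables 10 and 11 correspond to the largest and smallest signed entries of the per-sample error list under this assumed sign convention.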

Bland-Altman analysis
Bland-Altman analysis was used to analyze the difference between the predicted and reference blood glucose levels. The limits of agreement (LOA) were set at ±1.96 standard deviations (SD) from the mean difference 77. The Bland-Altman plot is illustrated in Fig. 8. The X-axis represents the mean of the reference and predicted blood glucose levels, whereas the Y-axis represents the difference between the reference and predicted blood glucose levels. It can be observed from Fig. 8a that, for the finger sensor, the mean difference (bias) of the blood glucose level was 0.035, with the 95% upper and lower limits of agreement at +3.7 and −3.6. Similarly, for the wrist sensor, as depicted in Fig. 8b, the mean difference (bias) of the blood glucose level was −0.7, with the 95% upper and lower LOA at +1.6 and −3.0, indicating strong agreement between the reference and predicted blood glucose levels.
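The bias and limits of agreement described above can be sketched as follows. The sketch assumes the difference is taken as reference minus predicted and uses the sample standard deviation; the function name and the example values are illustrative and are not data from the study.

```python
import statistics


def bland_altman(reference, predicted):
    """Return (bias, lower LOA, upper LOA, fraction of points inside the LOA)."""
    # Y-axis of the plot: difference between each pair (assumed reference - predicted).
    diffs = [r - p for r, p in zip(reference, predicted)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    lower = bias - 1.96 * sd
    upper = bias + 1.96 * sd
    inside = sum(1 for d in diffs if lower <= d <= upper) / len(diffs)
    return bias, lower, upper, inside


def pair_means(reference, predicted):
    """X-axis of the plot: mean of each reference/predicted pair."""
    return [(r + p) / 2.0 for r, p in zip(reference, predicted)]


# Illustrative values only, not measurements from the study.
ref_example = [100.0, 110.0, 120.0, 130.0]
pred_example = [98.0, 111.0, 121.0, 128.0]
bias, lower, upper, inside = bland_altman(ref_example, pred_example)
```

The limits are symmetric about the bias by construction, which is consistent with the reported values (for example, 0.035 ± 3.65 for the finger sensor).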

Clarke error grid analysis
CEG is an essential tool for evaluating the clinical accuracy of glucose monitoring devices 78. The analysis was performed between the reference and predicted blood glucose levels. It can be observed from Fig. 9a and b that all the values fall under zone A of the grid, which implies high clinical significance of the sensor for use in the medical field for effective diabetic management.
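A zone A check can be sketched as below. It assumes the commonly used definition of zone A (the prediction lies within 20% of the reference, or both values are below 70 mg/dL) and covers zone A only, not the full grid. The function names and example values are illustrative and are not taken from the study.

```python
def in_zone_a(reference_mgdl, predicted_mgdl):
    """Return True if the pair falls in zone A of the Clarke error grid.

    Assumed definition: prediction within 20% of the reference,
    or both values in the hypoglycemic range (< 70 mg/dL).
    """
    both_low = reference_mgdl < 70 and predicted_mgdl < 70
    within_20_percent = abs(predicted_mgdl - reference_mgdl) <= 0.2 * reference_mgdl
    return both_low or within_20_percent


def zone_a_fraction(reference, predicted):
    """Fraction of reference/predicted pairs that fall in zone A."""
    hits = sum(1 for r, p in zip(reference, predicted) if in_zone_a(r, p))
    return hits / len(reference)


# Illustrative values only (mg/dL), not measurements from the study.
fraction = zone_a_fraction([100.0, 150.0, 200.0], [105.0, 148.0, 190.0])
```

A fraction of 1.0 corresponds to the situation reported here, in which all data points lie in zone A.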

Statistical analysis
The data were subjected to statistical analysis, where a paired t-test was carried out because the measurements were taken for the same subjects, i.e., between the reference and predicted blood glucose levels 79. IBM SPSS software was used in the proposed work, where the null hypothesis and alternate hypothesis are presented in Eqs. (12) and (13), respectively. It was observed that the data were normal and the variances of the differences were equal; therefore, no correction was needed. The paired t-test was applied to the dataset with the significance level set at p < 0.05 50,64. It can be observed from Table 12 that v0.001, and df = 100 were obtained with t-value = 0.59 for the finger and 0.56 for the wrist sensor, leading to the acceptance of H0, where it was concluded that there was no difference between the reference and predicted blood glucose levels.
Null hypothesis (H0): There is no significant difference between the reference and predicted blood glucose levels.
Alternate hypothesis (H1): There is a significant difference between the reference and predicted blood glucose levels.
For the finger sensor 78,79, as t calculated < t critical, H0 is accepted. Similarly, for the wrist sensor 79,80, as t calculated < t critical, H0 is accepted.
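The decision rule above can be sketched in a few lines. The study used IBM SPSS; the following is only an illustrative re-expression of the paired t statistic, with hypothetical function names and example values that are not data from the study. The critical value must be supplied from a t-table; for df = 100 at a two-sided significance level of 0.05, it is approximately 1.984.

```python
import math
import statistics


def paired_t_statistic(reference, predicted):
    """Return (t statistic, degrees of freedom) for a paired t-test."""
    diffs = [r - p for r, p in zip(reference, predicted)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD of the paired differences
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    return t_stat, n - 1


def h0_is_accepted(t_stat, t_critical):
    """H0 (no significant difference) is retained when |t| < t critical."""
    return abs(t_stat) < t_critical


# Illustrative values only, not measurements from the study.
t_example, df_example = paired_t_statistic(
    [100.0, 110.0, 120.0, 130.0], [98.0, 111.0, 121.0, 128.0]
)

# Reported t-values with the tabulated critical value for df = 100 (about 1.984).
finger_accepts_h0 = h0_is_accepted(0.59, 1.984)
wrist_accepts_h0 = h0_is_accepted(0.56, 1.984)
```

Both reported t-values are well below the critical value, which matches the conclusion that H0 is accepted for the finger and wrist sensors.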

Comparison of niGLUC-2.0v with existing literature
The performance of the proposed device, i.e., niGLUC-2.0v, is compared with previous non-invasive approaches in Table 13. The proposed device was found to have greater accuracy, with an R2_SCORE of 99.02%, MAE of 0.15, and MSE of 0.22 for the finger sensor, whereas an R2_SCORE of 99.96%, MAE of 0.06, and MSE of 0.006 were obtained for the wrist sensor. The proposed sensor had the widest detection range, 80-488 mg/dL, compared to other studies. The integration of AI and data science with NIR technology has advanced beyond other studies by accurately predicting blood glucose levels. The results from ridge regression, the linear regression plot, Bland-Altman analysis, and CEG depict the high performance of both the finger and wrist sensors.
The data analytic framework proposed in the current study provided the best accuracy in the presence of ambiguities when compared to the current literature, with 99.02% for the finger and 99.96% for the wrist sensor. The statistical analysis with p < 0.05 strengthens the significance of the achieved accuracy, which is not achieved in other works. The 100% of blood glucose data points falling under zone A of the CEG proved to [...] and was validated by invasively obtained blood glucose levels. These results suggest that niGLUC-2.0v has the potential to accurately predict blood glucose levels and may be beneficial for better diabetic management. The proposed work has certain strengths: (i) The model was developed by considering the effects of variations in skin color, ambient light, and pressure on the device. Care was taken to avoid motion artifacts by obtaining the measurements in a stable state. (ii) In this study, blood glucose values were measured invasively and non-invasively in a large number of volunteers (101), including diabetic and non-diabetic volunteers of all ages (20-90 years). The developed sensor system was validated on diabetic and non-diabetic volunteers by random sampling, i.e., fasting, postprandial, and random testing. (iii) The ambiguities faced are handled in niGLUC-2.0v. The limitation of the proposed work is that the device can only be tested in the stable state of a volunteer. The device was portable but not wearable. Future work is aimed at creating a miniaturized wearable version of the proposed system that can be tested in motion with good accuracy.

Figure 2. Block diagram of the proposed sensor system.

Figure 3. Block diagram of circuit protection and components.

Figure 4. Flow chart of the proposed work.

Figure 6. The hardware setup of the niGLUC-2.0v: (a) Finger and wrist sensor; (b) Measurements from finger sensor; (c) Measurements from wrist sensor.

Figure 8. Bland-Altman plot between reference and predicted blood glucose levels. (a) Finger sensor; (b) Wrist sensor.

Figure 9. Clarke error grid analysis between reference and predicted blood glucose levels. (a) Finger sensor; (b) Wrist sensor.

Table 4. Exploratory data analysis on the dataset by setting a threshold for distribution. Inputs:

Table 7. Validation in the presence of ambient light ON and OFF condition. Vol: Volunteer; mmol: millimole.

Table 8. Validation in the presence of pressure on the sensor. Vol: Volunteer; mmol: millimole.

Table 12. Paired t-test between reference and predicted blood glucose levels for finger and wrist sensor. df: degrees of freedom.